Update to transformers v5#30566

Merged
simon-mo merged 201 commits into vllm-project:main from hmellor:transformers-v5 on Apr 15, 2026

Conversation

@hmellor
Member

@hmellor hmellor commented Dec 12, 2025

Changes:

  • Update Transformers pin to 5.5.3
  • Update Tokenizers pin to 0.22.2 (as required by Transformers 5.0.0)
  • Update PEFT lower bound to 0.18.1 so that huggingface/peft@41c07f0 is included (guards import of HybridCache on Transformers version)
  • Update Accelerate pin to 1.13.0 so that 4-bit bnb works on Transformers v5
  • Update Mamba pin to 2.3.0 so that state-spaces/mamba@35e927b is included (removes an import that was deleted in Transformers v5)
  • Update compressed-tensors to 0.15.0, the earliest version that supports Transformers v5
  • Add HF_HUB_DOWNLOAD_TIMEOUT=60 to the CI environment to deal with the shortened timeout in huggingface-hub>=1 since it switched to httpx
  • Add a backward-compatibility test suite that runs the same tests as "Transformers nightly", but with 4.57.5 installed

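For reference, the pin changes above can be summarized as the following requirements fragment (versions taken from the list; the file layout is illustrative, not the exact vLLM requirements files):

```
transformers==5.5.3
tokenizers==0.22.2
peft>=0.18.1
accelerate==1.13.0
mamba-ssm==2.3.0
compressed-tensors==0.15.0
```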
Some architectures/tests need to be skipped in order to get this upgrade through. We need this upgrade because remaining on Transformers v4 blocks proper support of SoTA architectures released after Transformers v5. This is not a commitment to drop these architectures forever, simply a temporary measure; we plan to restore these architectures/tests following the upgrade.

Architectures/models that will no longer work after the upgrade:

  • Plamo2ForCausalLM - Custom model code uses _tied_weight_keys: list[str] but Transformers v5 now expects _tied_weight_keys: dict[str, str]
  • OpenCUAForConditionalGeneration - Custom code is not compatible with Transformers v5
  • OpenPanguVLForConditionalGeneration - OpenPanguVLVideoProcessorInitKwargs does not specify total=False, making all kwargs required
  • Alibaba-NLP/gte-Qwen2-1.5B-instruct - numerical issues with this model
  • PaddlePaddle/PaddleOCR-VL - imports deleted object
  • Custom tokenizer not compatible with Transformers v5:
    • InternS1ForConditionalGeneration
    • BAAI/bge-code-v1
    • XverseForCausalLM
  • Custom processor not compatible with Transformers v5:
    • Ovis2_5
    • Ovis2_6_MoeForCausalLM
    • MiniCPMO
    • MiniCPMV
    • Phi4ForCausalLMV
  • Custom config not compatible with Transformers v5:
    • InternLM2VEForCausalLM
    • HCXVisionForCausalLM
    • Tarsier2ForConditionalGeneration
    • SarvamMLAForCausalLM

Tests that are disabled after upgrade:

  • VLM tests for intern_vl, isaac, ultravox because these models are broken in Transformers v5 and therefore the HF reference cannot be generated
  • The following checkpoints because HF reference cannot be generated:
    • jinaai/jina-embeddings-v3
    • OpenGVLab/InternViT-*
    • InternVisionModel
    • jinaai/jina-reranker-m0
    • nvidia/NVIDIA-Nemotron-Parse-v1.1
    • ColQwen3

Supplementary PRs:

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 12, 2025
@mergify mergify bot added the ci/build label Dec 12, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to update the transformers library to version 5. The changes correctly update the version in requirements/test.in and requirements/nightly_torch_test.txt, and also add the --pre flag to uv pip install in the Dockerfile to allow installation of the release candidate. However, there is a critical oversight: requirements/common.txt still contains a constraint transformers < 5. This will lead to build failures for any configuration that relies on common.txt. This file must be updated to allow transformers v5 for this PR to be mergeable.

Comment thread requirements/nightly_torch_test.txt Outdated
Comment thread requirements/test.in Outdated
@hmellor hmellor marked this pull request as ready for review December 12, 2025 17:56
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@hmellor hmellor changed the title update to transformers v5 Update to transformers v5 Dec 15, 2025
@hmellor hmellor linked an issue Dec 17, 2025 that may be closed by this pull request
1 task
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread requirements/nightly_torch_test.txt Outdated
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor linked an issue Jan 27, 2026 that may be closed by this pull request
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor requested a review from tjtanaa as a code owner January 27, 2026 23:32
@mergify mergify bot added the rocm Related to AMD ROCm label Jan 27, 2026
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@mergify
Contributor

mergify bot commented Jan 28, 2026

Documentation preview: https://vllm--30566.org.readthedocs.build/en/30566/

@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 28, 2026
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
hmellor and others added 6 commits April 14, 2026 10:48
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com>
Disable fused ops (VLLM_CPU_CI_ENV=0) for the untrained tiny-mixtral
model on CPU to reduce bfloat16 rounding that causes logprob divergence.
Also pass VLLM_CPU_ATTN_SPLIT_KV=0 to the CPU CI docker container.

Co-authored-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

I believe we took care of all the CI failures from the transformers v5 upgrade. Thanks @bigPYJ1151 for the CPU fix!

Running full CI again now: https://buildkite.com/vllm/ci/builds/61345 (hopefully it's the last)

khluu added 2 commits April 15, 2026 05:05
Signed-off-by: khluu <khluu000@gmail.com>
XVERSE tokenizer is incompatible with transformers v5 due to an
add_prefix_space / prepend_scheme mismatch in tokenizer.json that
causes loading to fail. Cap at transformers<=4.57 until upstream fixes.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's approach for basics model test (extra init 2)

    Skip XverseForCausalLM tests on transformers v5

    The XVERSE tokenizer (xverse/XVERSE-7B-Chat) is incompatible with transformers v5:
    AutoTokenizer.from_pretrained fails with "add_prefix_space does not match declared prepend_scheme"
    due to a mismatch in the model's tokenizer.json. This is an upstream issue in the XVERSE tokenizer
    files, not in vLLM or transformers.

    Added max_transformers_version="4.57" with transformers_version_reason={"vllm": ...} so both
    test_registry_imports and test_can_initialize_large_subset skip this model on transformers v5.

khluu added 2 commits April 15, 2026 06:05
Signed-off-by: khluu <khluu000@gmail.com>
Move _get_lora_aux_cuda_stream, lora_linear_async, and the custom op
registration out of the `if envs.VLLM_LORA_ENABLE_DUAL_STREAM:` block.

The block was evaluated at import time, but test fixtures set the env
var via monkeypatch after import, causing NameError / AttributeError
when the runtime code tried to call these functions.  They are only
invoked when `_enable_aux_cuda_stream` is True (checked at runtime),
so defining them unconditionally is safe.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's fix for this test: https://buildkite.com/vllm/ci/builds/61345#019d8f38-a3e5-47f5-94aa-031e3b466e29/L3122

test_olmoe_lora — NameError: _get_lora_aux_cuda_stream is not defined (cb03f5d)

The _get_lora_aux_cuda_stream function, lora_linear_async, and the direct_register_custom_op call
were all inside an if envs.VLLM_LORA_ENABLE_DUAL_STREAM: block evaluated at import time. The test
fixture sets the env var via monkeypatch.setenv after import, so the names were never defined when
the runtime code tried to use them. Moved all definitions outside the conditional — they're only
invoked when _enable_aux_cuda_stream is True (checked at runtime in _init_lora_stream_context and
apply), so registering them unconditionally is safe.
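The import-time gating pitfall described above can be sketched in a few lines. Names here mirror the description but are illustrative, not vLLM's actual LoRA code:

```python
import os

# Sketch of the import-time gating pitfall (names are illustrative).
# BAD: definitions only exist if the env var was set *before* import:
#
#   if os.environ.get("VLLM_LORA_ENABLE_DUAL_STREAM") == "1":
#       def lora_linear_async(...): ...
#
# A test that sets the env var via monkeypatch *after* import then hits
# NameError when the runtime code calls the never-defined function.

# GOOD: define unconditionally, gate at call time.
def lora_linear_async(x: float) -> float:
    return x * 2

def apply(x: float) -> float:
    # Runtime check: the env var is read when called, not when imported.
    if os.environ.get("VLLM_LORA_ENABLE_DUAL_STREAM") == "1":
        return lora_linear_async(x)
    return x
```

Defining the functions unconditionally is safe because the environment variable still decides whether they are ever invoked.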

@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's fix for step3 tool parser

  1. test_step3_tool_parser — spaces stripped in streaming tests (e187e72)

Transformers v5's LlamaTokenizerFast.__init__ unconditionally replaces the pre-tokenizer from
tokenizer.json with Metaspace. For models like stepfun-ai/step3 whose tokenizer uses ByteLevel, this
causes spaces to be silently dropped. Added _restore_original_pretokenizer() in vllm/tokenizers/hf.py
that detects the mismatch and restores the original pre-tokenizer/decoder from tokenizer.json.

  2. test_minimax_tool_parser — trust_remote_code error (e187e72)

Transformers v5 now calls AutoConfig.from_pretrained internally during tokenizer loading. For
custom-code models like MiniMax, this requires trust_remote_code=True. Added the missing parameter to
the test fixture.

Wrap the get_config() call in get_tokenizer() with contextlib.suppress
so it gracefully handles paths that don't contain a config.json (e.g.
LoRA adapter directories passed as tokenizer paths).  The config
pre-registration is only needed for custom vllm configs and is
irrelevant for adapter or tokenizer-only paths.

Fixes test_quant_model_lora failure.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's fix for https://buildkite.com/vllm/ci/builds/61345#019d8f38-789f-401f-b021-183d4141b2f8/L2600

Fix test_quant_model_lora crash — LoRA adapter path passed to get_config() (cc19a1b)

The get_config() call added in get_tokenizer() (commit 8f551d0) pre-registers custom vllm configs
with AutoConfig before tokenizer loading. However, tokenizer_name can be a LoRA adapter directory
(e.g. jashing/tinyllama-colorist-lora), which doesn't contain a config.json — only
adapter_config.json. This causes get_config() to raise ValueError: Invalid repository ID or local
directory specified.

Wrapped the call with contextlib.suppress(ValueError, OSError) since the config pre-registration is
only relevant for models with custom vllm configs, not for LoRA adapter or tokenizer-only paths.
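The best-effort pre-registration pattern described above is just a `contextlib.suppress` wrapper. A minimal sketch, assuming stand-in implementations (these are not vLLM's real `get_config`/`get_tokenizer`):

```python
import contextlib

# Sketch of tolerant config pre-registration (stand-in functions, not vLLM's).

def get_config(path: str) -> dict:
    """Stand-in: LoRA adapter dirs have adapter_config.json but no config.json."""
    if path.endswith("-lora"):  # simulate an adapter directory
        raise ValueError("Invalid repository ID or local directory specified")
    return {"model_type": "demo"}

def get_tokenizer(path: str) -> str:
    # Pre-registering custom configs is best-effort: adapter or tokenizer-only
    # paths have no config.json, so swallow the expected failure modes.
    with contextlib.suppress(ValueError, OSError):
        get_config(path)
    return f"tokenizer-for-{path}"
```

Tokenizer loading now succeeds whether or not the path carries a usable config.json.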

hmellor added 2 commits April 15, 2026 08:27
This reverts commit e187e72.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor
Member Author

hmellor commented Apr 15, 2026

For step3_vl, the issue is that tokenizer_config.json explicitly sets the wrong tokenizer_class (it is not a LlamaTokenizerFast).

I've pushed 816db8b which uses TokenizersBackend instead. This class determines what the tokenizer is from the actual tokenizer.json and is significantly more reliable. We can use this mechanism for other checkpoints we find with the same issue.

The longer term solution is to upstream this override to Transformers, which I have done in huggingface/transformers#45449.

hmellor added 2 commits April 15, 2026 09:59
This reverts commit cb03f5d.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor
Member Author

hmellor commented Apr 15, 2026

I've moved the fix for VLLM_LORA_ENABLE_DUAL_STREAM to the test code so that the behaviour in vLLM is unchanged.

hmellor and others added 3 commits April 15, 2026 11:06
These models fail with `AttributeError: 'dict' object has no
attribute '__name__'` on transformers v5.2+.  Add
max_transformers_version="5.1" until upstream compatibility is fixed.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

full CI run: https://buildkite.com/vllm/ci/builds/61509
hopefully we can close this at exactly 200 commits xD

The processing test uses check_version_reason="vllm", so the skip
reason must be "vllm" not "hf" to actually take effect.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>

Labels

  • ci/build
  • cpu: Related to CPU backends
  • documentation: Improvements or additions to documentation
  • intel-gpu: Related to Intel GPU
  • multi-modality: Related to multi-modality (#4194)
  • nvidia
  • qwen: Related to Qwen models
  • rocm: Related to AMD ROCm
  • tool-calling
  • v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

  • Upgrade to Transformers v5
  • [Feature]: Support for transformers 5.2.0
  • Bump transformers to 5.0.0
  • [Feature]: Support transformers>=5

7 participants